Third Party Assembler Interface and Linker
Traditionally in Forth systems, a "Forth Assembler" has been
included. Adding assembler components to a high level language can
produce dramatic improvements in performance and capability over pure
high level Forth. Unfortunately these assemblers are usually written in
Forth and have serious limitations. Often their syntax is markedly
different from the standard syntax for the particular processor. Most
programmers find it difficult enough to work in normal assembler
syntax, without having to learn a new one.
L.O.V.E. FORTH has been designed to use virtually any third
party assembler, using standard assembler syntax. Whenever CODE ,
;CODE or ASM is encountered, Forth calls the third party assembler
to process the word and then links in the resulting object file with a
built-in linker. This means that not only can normal syntax be used in
words created by the programmer, but also that assembly language program
sections from other sources can be included with little or no
modification.
The authors recommend the excellent assembler A86 by Isaacson,
also available as shareware. The original L.O.V.E. FORTH RPN assembler
is included with the system as source code, for use if desired.
Operation
A small amount of set-up is required in order to configure the
system. The authors have already included configuration files for
A86, Microsoft's MASM and Borland's TASM (see Assembler Set-up below).
For simple code words, like those supported by the old RPN assemblers,
use is straightforward. For example, a word to make four copies of
the top of stack:
CODE DUP4 ; ( n -- n,n,n,n )
pop ax
push ax ; push some copies
push ax
push ax
push ax
next c;
The operation NEXT above is a pre-defined macro.
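The stack effect ( n -- n,n,n,n ) can be checked with a tiny Python model of the data stack. This is only an illustration: the list standing in for the 8086 stack and the function name dup4 are assumptions, not part of L.O.V.E. Forth.

```python
def dup4(stack: list) -> None:
    """Model of DUP4 ( n -- n,n,n,n ): pop n into AX, then push AX four times."""
    ax = stack.pop()        # pop ax
    for _ in range(4):      # push ax, four times
        stack.append(ax)

stack = [7, 42]             # 42 is the top of stack
dup4(stack)
print(stack)                # [7, 42, 42, 42, 42]
```

Because the original value is popped first, four pushes leave exactly four copies on the stack.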
There are many other powerful features of this facility, notably
the use of declarations in the assembly code. Not only machine code
but any other kind of data can be assembled, including threads, heads
and data. New words can be defined using PUBLIC, and existing words
can be referenced with EXTRN. These declarations are all interpreted by
the linker portion of this interface.
Errors during assembly
If the assembler fails to produce an object file, an error
message is displayed, and compilation is aborted. The programmer must
then examine the error or listing file mentioned in the error message
in order to determine the problem. The file containing the code to
assemble is usually called CODE-4TH.ASM, and the file with the errors
is usually named CODE-4TH.ERR or CODE-4TH.LST.
SEGMENT Declarations
The linker supports several reserved segment and class names, used
to direct code into the various segments. These are: 'CODE', 'THREADS',
'DATA', 'HEADS', and 'STACKS'. These reserved names can be used either
as segment names (most common) or as class names. When one is used as a
segment name, any class name specified is ignored.
The following segments are declared automatically for the
programmer at the beginning of each assembly. The programmer need only
switch between them (e.g. HEADS SEGMENT is sufficient to switch to the
heads segment, without the rest of the declaration).
code segment byte public 'CODE'
code ends
threads segment word public 'THREADS'
threads ends
data segment byte public 'DATA'
data ends
heads segment byte public 'HEADS'
heads ends
stacks segment byte public 'STACKS'
stacks ends
The code segment is the default if no other is specified,
allowing simple words to assemble with no declarations whatsoever.
A CODE SEGMENT statement is automatically inserted before the
assembler statements, and the statements CODE ENDS and END after the
end of the assembler word. The directive:
ASSUME CS:CODE, DS:CODE, ES:CODE
is also inserted, so no segment overrides will be inserted by the
assembler, unless the programmer explicitly includes them.
Origins
When any segment is declared in an assembler, its origin is assumed
to be 0. This is fine when all the code being dealt with is produced by
the assembler and the programmer is in complete control. Here, however,
the code must be loaded on top of an existing program, L.O.V.E. Forth
itself. Therefore the origins have been constructed to follow a slightly
different pattern.
When a reserved name is used as a segment name, the real segment
origin is at 0000 in the L.O.V.E. Forth segment. The origin (if any) given
by the programmer is incremented by HERE (or CS:HERE, TS:HERE, etc.)
before the code is loaded in. This ensures that no existing areas of
memory are overwritten. The alignment attribute is not meaningful for
the standard segments, since they already start on a boundary that
satisfies byte, word, paragraph and page alignment.
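A minimal Python sketch of the origin rule for reserved segment names follows. The HERE values are made-up numbers and load_offset is a hypothetical helper; the sketch only shows that every offset coming from the object file is shifted by the current dictionary pointer of its segment, so nothing already in the dictionary is overwritten. The 64K overflow check is an added assumption about what a careful linker would do.

```python
# Hypothetical current dictionary pointers (offsets within each segment).
HERE = {"CODE": 0x3A10, "THREADS": 0x1F02, "DATA": 0x0840,
        "HEADS": 0x2200, "STACKS": 0x0100}

def load_offset(segment: str, obj_offset: int) -> int:
    """Return where a byte at obj_offset in the object file is loaded.

    The assembler assumed the segment started at 0; the linker shifts
    every offset up by that segment's HERE so existing code is preserved.
    """
    offset = HERE[segment.upper()] + obj_offset
    if offset > 0xFFFF:                      # assumed check: stay inside 64K
        raise OverflowError(f"{segment} segment would exceed 64K")
    return offset

print(hex(load_offset("CODE", 0)))       # 0x3a10, i.e. CS:HERE
print(hex(load_offset("DATA", 0x20)))    # 0x860
```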
If the programmer wants an origin of 0 in the segment being
declared, a different (unreserved) segment name should be used. In this
case the linker looks to the class name for direction on where to load
the code into memory. If the class name is not specified, the code is
loaded into the CODE segment. The alignment type may be specified if
desired; the combine type is ignored.
SEGMENT Examples
The most common declaration is:
CODE SEGMENT
which causes the code following it to be placed in the code
segment. The origin coming in from the object file (normally
0 for the first code in that segment) is incremented by the
dictionary pointer. Therefore the ORG is forced to be CS:HERE.
Another more complex example is:
MYTHREADS SEGMENT WORD PUBLIC 'THREADS'
which causes the following code to be loaded into the thread
segment. The origin is relative to the start of this declared
segment.
MYSEG SEGMENT
Code/data in this segment has its own origin of 0.
If grouped, however, its offset is measured from the start
of the group and must lie within 64K. It is placed in RAM in
one of the standard segments (in this case the code segment,
since no class name is given).
THREADS SEGMENT byte public 'code'
The segment name and class name conflict; since THREADS is a
reserved segment name, the class is ignored.
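The rules illustrated by these examples can be modelled as a small decision function. This Python sketch is hypothetical (target_segment is not a L.O.V.E. Forth word), and the handling of an unreserved name with an unreserved class name (falling back to CODE) is an assumption, since the text only covers a missing class name.

```python
from typing import Optional

RESERVED = {"CODE", "THREADS", "DATA", "HEADS", "STACKS"}

def target_segment(seg_name: str, class_name: Optional[str] = None) -> str:
    """Return the L.O.V.E. Forth segment that a SEGMENT declaration loads into."""
    seg = seg_name.upper()
    if seg in RESERVED:
        return seg                   # reserved segment name: class is ignored
    cls = (class_name or "").strip("'").upper()
    if cls in RESERVED:
        return cls                   # unreserved name: the class directs loading
    return "CODE"                    # no (usable) class: default to CODE

print(target_segment("CODE"))                    # CODE
print(target_segment("MYTHREADS", "'THREADS'"))  # THREADS
print(target_segment("MYSEG"))                   # CODE
print(target_segment("THREADS", "'code'"))       # THREADS
```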
GROUP Declaration
The programmer may declare any group that does not combine
different L.O.V.E. Forth segments (this is impossible, because those
segments are 64K or more apart). A segment may belong to only one group.
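The two GROUP restrictions can be expressed as a check over the declared groups. This Python sketch is a hypothetical illustration: check_groups and its dictionary inputs are assumptions, with each segment name mapped to the L.O.V.E. Forth segment it loads into.

```python
from typing import Dict, List

def check_groups(groups: Dict[str, List[str]], target: Dict[str, str]) -> None:
    """Raise ValueError if a GROUP breaks either rule stated above.

    groups maps each group name to its member segment names; target maps
    each segment name to the L.O.V.E. Forth segment it is loaded into.
    """
    owner: Dict[str, str] = {}
    for group, members in groups.items():
        forth_segs = {target[m] for m in members}
        if len(forth_segs) > 1:        # members would be 64K or more apart
            raise ValueError(f"{group} mixes segments {sorted(forth_segs)}")
        for m in members:
            if m in owner:             # a segment may be in only one group
                raise ValueError(f"{m} is in both {owner[m]} and {group}")
            owner[m] = group

target = {"MYSEG": "CODE", "TABLES": "CODE", "MYTHREADS": "THREADS"}
check_groups({"MYGROUP": ["MYSEG", "TABLES"]}, target)   # accepted
```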
EXTRN declarations
The address or value of existing Forth words may be referenced
in the assembler code, using the EXTRN declaration. Since words in
L.O.V.E. Forth have several parts, the address of each part may be
obtained by adding a special prefix to the name desired. The prefixes
are resolved by the linker.
Prefix           Segment    Purpose
                 Register
CODE@ or none    CS         address of machine code
THREADS@         DS         compilation address
DATA@            ES         parameter field address
HEADS@           n/a        name field address
IMMEDIATE@       n/a        special: executes the following word
                            at link time to obtain a value
For example:
EXTRN CODE@COUNT:NEAR, DATA@TIB:BYTE, IMMEDIATE@HERE:ABS
MOV BYTE PTR ES:DATA@TIB, 0DH ; install carriage return
ADD AX,IMMEDIATE@HERE ; add HERE
JMP CODE@COUNT ; exit via a forth word
If the word appears without a prefix, or if CODE@ is in front of
the word, the address of the related machine code is returned.
This is the same value as is returned by 'CODE . Similarly THREADS@
returns the compilation address of the following word. The most useful
prefix is perhaps DATA@, which returns the parameter field address: the
address returned by a VARIABLE or other word created by CREATE. HEADS@
returns the name field address. This is relative to the heads segment,
the actual value of which can be obtained from the label HSEG (see
Frame Fixups below).
The IMMEDIATE@ prefix executes a word at link time. This is
typically a CONSTANT whose value is required, or a VARIABLE whose
address is required in assembly code (e.g. IMMEDIATE@BL). It can be
any word that returns a single cell on the stack. If HERE or the other
dictionary values are referenced, they return the values they had
prior to linking.
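The prefix table above can be read as a lookup performed on each external symbol. The following Python sketch, with the hypothetical helper split_extrn, shows one way a linker could separate the prefix from the Forth word name; it is not the actual L.O.V.E. Forth linker, and it expects the bare symbol without the :NEAR or :BYTE type suffix.

```python
from typing import Optional, Tuple

# prefix -> (part of the Forth word, segment register it is relative to)
PREFIXES = {
    "CODE@": ("address of machine code", "CS"),
    "THREADS@": ("compilation address", "DS"),
    "DATA@": ("parameter field address", "ES"),
    "HEADS@": ("name field address", None),
    "IMMEDIATE@": ("value of word executed at link time", None),
}

def split_extrn(symbol: str) -> Tuple[str, str, str, Optional[str]]:
    """Split a bare EXTRN symbol into (prefix, forth_name, part, register)."""
    for prefix, (part, reg) in PREFIXES.items():
        if symbol.upper().startswith(prefix):
            return prefix, symbol[len(prefix):], part, reg
    part, reg = PREFIXES["CODE@"]        # no prefix: same as CODE@
    return "CODE@", symbol, part, reg

print(split_extrn("DATA@TIB"))  # ('DATA@', 'TIB', 'parameter field address', 'ES')
print(split_extrn("COUNT"))     # ('CODE@', 'COUNT', 'address of machine code', 'CS')
```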
If using MASM, the programmer must pay particular attention to
how the external references are declared. When the reference is used as
a memory pointer (e.g. BYTE PTR), it must be declared as
:BYTE or :WORD (or another address declaration). A value used as an
immediate operand must be declared :ABS . If a reference is mis-declared,
MASM ignores the addressing mode explicitly used in the instruction in
favour of what is implied by the EXTRN declaration. A reference can
therefore not be used both as an immediate operand and as a memory
reference.
If using A86, the programmer need not include the EXTRN
directive, as any undefined symbols are automatically declared
external. If the EXTRN directive is used, any type declaration
(:NEAR, :WORD, :ABS, etc.) may be given; A86 handles all cases
correctly.
Forth Words with Illegal Characters
When words contain characters that are illegal for the
assembler, a prefix of %% may be used. This prefix is dealt with before
assembly begins, and changes the name to one acceptable to the
assembler. Illegal characters include +-*/%^() and many more.
The word prefixed by %% must, however, be terminated by a space, tab or
end of line. For example:
%%-TRAILING %%+! %%2DUP
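The document does not say what name the %% prefix is translated into. The Python sketch below uses an assumed scheme (an F_ prefix, with every character other than an ASCII letter or digit written as _ followed by two hex digits) purely to show the idea: the translation happens on the source text before assembly, each %% name ends at a space, tab or end of line, and the mapping is reversible so that the linker can recover the Forth name. The functions mangle, demangle and expand_line are hypothetical, and the scheme assumes ASCII names.

```python
import re

def mangle(name: str) -> str:
    """Assumed scheme: F_ + name, non-alphanumerics written as _XX (hex)."""
    out = ["F_"]
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_%02X" % ord(ch))   # '_' itself is escaped too
    return "".join(out)

def demangle(symbol: str) -> str:
    """Recover the Forth name from a mangled symbol (inverse of mangle)."""
    return re.sub(r"_([0-9A-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), symbol[2:])

def expand_line(line: str) -> str:
    """Replace every %%name (ending at space, tab or end of line) in a line."""
    return re.sub(r"%%([^ \t\r\n]+)", lambda m: mangle(m.group(1)), line)

print(expand_line("extrn %%*/ :near"))   # extrn F__2A_2F :near
print(demangle(mangle("-TRAILING")))     # -TRAILING
```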
A complete example, in which the word exits via */ (star-slash):
CODE 550_337_*/ ; ( scale n by this fraction to get m ( n -- m )
extrn %%*/ :near ; reference to the word */
mov ax,550
push ax
mov ax,337
push ax
jmp %%*/ c;
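The arithmetic of 550_337_*/ can be checked in Python. With n already on the stack, the word pushes 550 and 337 and jumps to */, which computes n*550/337 using a double-length intermediate product, so the product may exceed a 16-bit cell as long as the final result fits. The sketch assumes Forth-83 floored division (for positive operands this agrees with truncating division); star_slash and scale_550_337 are hypothetical names.

```python
CELL_MIN, CELL_MAX = -32768, 32767   # 16-bit signed cell

def star_slash(n1: int, n2: int, n3: int) -> int:
    """Forth */ ( n1 n2 n3 -- n4 ): n1*n2/n3 with a double-length product."""
    product = n1 * n2          # 32-bit intermediate on the 8086
    return product // n3       # floored division (assumed Forth-83 behaviour)

def scale_550_337(n: int) -> int:
    """Model of 550_337_*/ ( n -- m ): push 550, push 337, then */."""
    m = star_slash(n, 550, 337)
    if not CELL_MIN <= m <= CELL_MAX:
        raise OverflowError("result does not fit in one cell")
    return m

print(1000 * 550)            # 550000: too big for a 16-bit cell
print(scale_550_337(1000))   # 1632: the final result fits
```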
PUBLIC declarations
Just as it is possible to reference Forth words from within
assembler with EXTRN, it is also possible to create new words. This is
done with the PUBLIC directive, which can be used to create
multiple entry points in words, or simply to create address references
available in high level code or other code definitions. The %% prefix
described above can be used to make names with assembler-illegal
characters. Example:
CODE QDROP ; ( q -- )
POP AX ; yes, there are more efficient ways of coding
POP AX ; this word
DDROP: POP AX
DROP: POP AX
NEXT
PUBLIC DDROP ; ( d -- )
PUBLIC DROP ; ( n -- )
c;
As shown in the table below, PUBLIC declarations work
differently depending on the segment in which the label is declared.
Note that a label in the data segment effectively becomes a VARIABLE,
since the CONSTANT created for it holds a data address.
code segment          A CODE word is created
threads segment       The PUBLIC address is assumed to be
                      the compilation address of a word
other segment names   A CONSTANT is created with the value
                      of the PUBLIC address
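The remark that a data segment label effectively becomes a VARIABLE can be illustrated with a toy model. In this Python sketch (all names hypothetical, with a bytearray standing in for the DATA segment) the CONSTANT created for the PUBLIC label simply returns an address in the data segment, and fetching and storing through that address behaves exactly like using a VARIABLE.

```python
# Toy model: a bytearray stands in for the DATA segment (hypothetical).
data_segment = bytearray(64)

def make_constant(value: int):
    """A Forth CONSTANT: executing it just returns its value."""
    return lambda: value

def fetch(addr: int) -> int:                  # Forth @ (16-bit, little-endian)
    return int.from_bytes(data_segment[addr:addr + 2], "little")

def store(value: int, addr: int) -> None:     # Forth ! ( n addr -- )
    data_segment[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")

def public_result(segment: str) -> str:
    """What PUBLIC creates, following the table above."""
    seg = segment.upper()
    if seg == "CODE":
        return "CODE word"
    if seg == "THREADS":
        return "word with the label as its compilation address"
    return "CONSTANT holding the label's address"

# A PUBLIC label at offset 0x10 in the data segment becomes a CONSTANT...
counter = make_constant(0x10)
# ...which is used exactly like a VARIABLE:  5 COUNTER !   COUNTER @
store(5, counter())
print(fetch(counter()))  # 5
```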
A PUBLIC Caution about FORGET
Words declared PUBLIC are CREATEd at link time. Unfortunately,
most linkers do not provide PUBLIC declarations in any predictable
order. This means that a word declared later may refer to a word
lower in memory. This conflicts with FORGET, which removes everything
above the forgotten word. When using FORGET, be sure to forget all of
the words PUBLICly CREATEd within one code word or ASM section.
The Command ASM
ASM is the best way to include a large body of assembly code
into Forth. ASM simply begins a section of assembly language code.
Unlike CODE, it CREATEs no word; words that must be accessible from
high level Forth or other assembler words should be declared PUBLIC as
described above. Many code words can thus be included in one section.
Example:
ASM
code segment
BIT: ; ( access a table of bits ( n -- bit )
POP BX
ADD BX,BX
PUSH es: [BX+bittable]
NEXT
code ends
data segment
assume cs:data
bittable: dw 1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
dw 16384,32768
data ends
PUBLIC BIT
end c;
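What BIT computes can be mirrored in Python. The sketch rebuilds bittable as sixteen little-endian 16-bit words and follows the three instructions: the index n is doubled with ADD BX,BX because each table entry is two bytes wide, and the word at that byte offset is pushed. The function name bit simply reuses the Forth name; unlike the assembly version, which does no range check, an out-of-range n here raises struct.error instead of reading past the table.

```python
import struct

# bittable from the example: 16 little-endian words 1, 2, 4, ..., 32768
BITTABLE = struct.pack("<16H", *(1 << i for i in range(16)))

def bit(n: int) -> int:
    """Model of BIT ( n -- bit )."""
    bx = n                     # POP BX
    bx = bx + bx               # ADD BX,BX: entries are 2 bytes wide
    (value,) = struct.unpack_from("<H", BITTABLE, bx)   # PUSH ES:[BX+bittable]
    return value

print([bit(n) for n in (0, 3, 15)])   # [1, 8, 32768]
```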
Linking OBJect Files
The linker is automatically started after assembling a code
word with CODE, ;CODE or ASM. It is also possible for the linker to
operate on existing object files, and the authors may deliver object
file versions of utilities and upgrades in the future. The syntax for
this command is LINK" followed by the path and file name of a
Microsoft format OBJ file. For example:
LINK" MATRIX.OBJ"
would link in the specified file.
Assembler Set-up
Three assemblers are currently supported directly: A86,
Microsoft MASM (versions 5 and 6) and Borland TASM. In order to
use one of these, its configuration file must be copied to the name
ASSEMBLY.CFG. For example, to use A86 type:
COPY LOVEA86.CFG ASSEMBLY.CFG
For MASM 5, MASM 6 and TASM the files are LOVEMASM.CFG, LOVEML6.CFG and
LOVETASM.CFG respectively. MASM version 6 takes so much memory that
its extended memory version must be used. This only works if EMM386
is not loaded.
If using another assembler, any of the above files can be
modified according to what the assembler needs. Read the
instructions in the CFG files (plain ASCII text). The following
information must be provided:
command line
input, output, listing, error files
the macro definition for NEXT
the segment declarations
lines to precede the lines parsed from CODE or ;CODE
lines to follow the lines from CODE or ;CODE
When the assembly file is created, first the macro
definition and then the segment declarations described above are
inserted into the file, along with the name of the word being assembled
(if applicable). If assembling with CODE or ;CODE, the "lines to
precede" listed above are inserted next, then the lines between
CODE (or ;CODE) and C;. The file is terminated with the "lines to
follow". If the command ASM is used, the lines between ASM
and C; are inserted following the segment declarations, and the file is
terminated.
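The order described above can be summarised as a short Python sketch. The configuration keys (next_macro, segments, precede, follow), the placeholder NEXT line and the way the word name is written (here as a comment) are all assumptions, since the actual ASSEMBLY.CFG format is defined by the files shipped with the system. The precede and follow lines in the sample configuration are taken from the earlier description of what is inserted around CODE words.

```python
from typing import Dict, List, Optional

def build_asm_file(cfg: Dict[str, List[str]], body: List[str],
                   word_name: Optional[str] = None,
                   is_asm: bool = False) -> List[str]:
    """Return the lines of the temporary assembly file (usually CODE-4TH.ASM)."""
    out: List[str] = []
    out += cfg["next_macro"]          # 1. macro definition for NEXT
    out += cfg["segments"]            # 2. standard segment declarations
    if word_name:                     # 3. name of the word (format assumed)
        out.append(f"; word: {word_name}")
    if is_asm:
        out += body                   # ASM: lines between ASM and C; only
    else:
        out += cfg["precede"]         # CODE/;CODE: lines to precede
        out += body                   # lines between CODE and C;
        out += cfg["follow"]          # lines to follow
    return out

cfg = {
    "next_macro": ["; (NEXT macro definition from ASSEMBLY.CFG)"],
    "segments": ["code segment byte public 'CODE'", "code ends"],  # abridged
    "precede": ["code segment", "assume cs:code, ds:code, es:code"],
    "follow": ["code ends", "end"],
}
body = ["pop ax", "push ax", "push ax", "push ax", "push ax", "next"]
for line in build_asm_file(cfg, body, "DUP4"):
    print(line)
```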
Improving performance
This method of assembly can be slow on any machine. Calling
another program (the assembler) through DOS is time consuming,
especially because of the disk accesses involved. There are two ways
to speed this up:
1. Use the ASM facility to group CODE words together. The
words which would otherwise have been declared separately
will all be declared at one time, using the PUBLIC
declaration. The assembler is only invoked once per ASM
section.
2. Create a small RAM disk to hold the temporary files
listed in ASSEMBLY.CFG (just change the drive and/or
directory where these are stored). For most words a size of
30k should be more than enough. The assembler itself can
also be copied to the RAM disk if it is big enough.
Frame Fixups
Frame fixups are not supported. This means that explicit references
to segments are not allowed. Keep in mind that on entry to any code word
the segment registers contain the usual segment values. In addition,
there are locations defined in CS: (the CODE segment) that contain the
current segment addresses of the standard segments. (These are CONSTANTs.)
Address   Contains segment value of    Also in register
CSEG      CODE                         CS
TSEG      THREADS                      DS
VSEG      DATA                         ES
SSEG      STACKS                       SS
HSEG      HEADS                        n/a
PSPSEG    DOS program segment prefix   n/a
Access to these values is therefore via the CS register. For example,
to load the VSEG value (the DATA segment) into DS:
MOV DS, word ptr CS: IMMEDIATE@VSEG
Why frame fixups are not supported
In order to be used interactively, any frame numbers included
in code would have to be resolved immediately on assembly. This is not
a problem in itself; the problems occur later. When an application is
SAVEd and then re-executed at a later time, the location in memory where
DOS loads the program is often different. Relocation is supported by
DOS: the EXE file header can contain relocation items. However, when
the program is SAVEd, the segment memory images are concatenated and
the result is saved in the EXE file. It is difficult to determine both
where the fixup locations are and where they are to point to, since on
re-execution the image is expanded again. In addition, before the image
is saved, these references would have to be de-relocated. This is not
impossible, but it is difficult. Further difficulties arise if
the program is saved as a final APPLICATION, where the program is both
saved and executed in its concatenated form.
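The DOS relocation mechanism mentioned above works by adding the segment at which the image was loaded to each 16-bit word named in the EXE header's relocation table; de-relocating before a save means subtracting it again. The Python sketch below illustrates this arithmetic on a byte array. relocate is a hypothetical helper, and relocation items are given here as (segment, offset) pairs relative to the start of the image; it models only the DOS loader's arithmetic, not anything L.O.V.E. Forth itself does.

```python
import struct
from typing import List, Tuple

def relocate(image: bytearray, items: List[Tuple[int, int]],
             load_seg: int, undo: bool = False) -> None:
    """Add (or, with undo=True, subtract) load_seg at each relocation item.

    Each item is a (segment, offset) pair relative to the start of the
    image, as in an EXE relocation table.
    """
    delta = -load_seg if undo else load_seg
    for seg, off in items:
        pos = seg * 16 + off                       # real-mode paragraph arithmetic
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + delta) & 0xFFFF)

# A far pointer's segment word at offset 4 holds 0x0010 (relative to the image).
image = bytearray(16)
struct.pack_into("<H", image, 4, 0x0010)
items = [(0, 4)]

relocate(image, items, 0x1234)               # DOS loads the image at 0x1234
print(hex(struct.unpack_from("<H", image, 4)[0]))   # 0x1244
relocate(image, items, 0x1234, undo=True)    # de-relocate before saving
print(hex(struct.unpack_from("<H", image, 4)[0]))   # 0x10
```

If the save skipped the undo step, the next load at a different segment would add its load segment on top of 0x1244 and the pointer would be wrong.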
A version of L.O.V.E. Forth in preparation is able to perform
frame fixups (the fixup information is stored as a field in each
dictionary head). When saving an application with APPLICATION" these
data are transferred to the .EXE header.